Low-Cost, High-Performance Barrier Synchronization on Networks of Workstations
Abstract
[...]structions between synchronizations). Barriers are important synchronization operations in multiprocessor systems [5, 19]. When a processor executes a barrier instruction, it first checks in at the barrier to indicate to the other processors that it has arrived at the specified synchronization point. It must then wait for all other processors participating in the barrier to check in, after which all processors can proceed past the barrier and begin executing their next assigned tasks. Tasks may include parallel loop iterations [16], emulation of very long instruction words (VLIW) [14], or dataflow steps [11].
Current barrier implementations typically use software trees [18], software- or hardware-based counters [12, 13, 17], or hardware fan-in trees [3, 5, 6, 19, 20]. While software trees are inexpensive, their synchronization delay is too long for fine-grained applications, especially on systems lacking a shared bus. Counter-based methods introduce potentially high contention for the shared counters, which can lead to unacceptably long and unpredictable synchronization delays. The fan-in tree is an example of a class of hardware solutions with very good performance. However, the best hardware solutions are not easily adapted to a wide range of existing systems; they depend on a particular topology, or even a particular implementation. The hardware required to implement the best techniques is often complex and expensive, requiring O(N^2) wiring complexity [5, 20] for the fastest performance, a severe problem on networks of workstations. Slow and expensive associative memory may also be required [20].
Note that fine granularity precludes using barriers in a multiprogramming environment, since a task switch by one processing element (PE) would delay any barrier in which it is involved until that PE resumes execution. Extremely fast context switching [1, 8] can be used to mitigate this problem.
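To make the check-in-and-wait behavior and the counter-based approach concrete, here is a minimal software sketch of a centralized counter barrier with sense reversal. It is an illustration only and is not the mechanism proposed in this paper. The names `CounterBarrier` and `run_phases` are hypothetical, and Python threads stand in for processors. Every arriving thread must take the same lock to update the shared counter, which is the contention point noted above.

```python
import threading


class CounterBarrier:
    """Centralized counter barrier with sense reversal (illustrative sketch)."""

    def __init__(self, n):
        self.n = n              # number of participating threads
        self.count = 0          # how many have checked in during this phase
        self.sense = False      # flips each time a phase completes
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            # Each phase waits for the shared sense to flip to this value.
            my_sense = not self.sense
            self.count += 1     # check in: all threads serialize on this lock
            if self.count == self.n:
                # Last arrival resets the counter and releases everyone.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                while self.sense != my_sense:
                    self.cond.wait()


def run_phases(n_workers=4, n_phases=3):
    """Run workers through several barrier phases and return an event log."""
    barrier = CounterBarrier(n_workers)
    log = []
    log_lock = threading.Lock()

    def worker(pe):
        for phase in range(n_phases):
            with log_lock:
                log.append(("arrive", phase, pe))
            barrier.wait()
            with log_lock:
                log.append(("depart", phase, pe))

    threads = [threading.Thread(target=worker, args=(pe,))
               for pe in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    events = run_phases()
    print(len(events), "events logged")
```

In the returned log, every "depart" entry for a phase appears after all "arrive" entries for that phase, which is the barrier property described above. The sense flag lets the same barrier object be reused across phases without a separate reset step.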
The barrier mechanism proposed in this paper is inexpensive, using only simple bit-serial hardware in each PE and a single-conductor serial ring between PEs. Since the ring itself has no clocked sequential logic, the only latency introduced by the barrier hardware is a single gate [...]
Journal of Parallel and Distributed Computing 40, 131–137 (1997), Article No. PC961273
Journal: J. Parallel Distrib. Comput.
Volume: 40
Pages: 131–137
Publication date: 1997